5/21/2019

Agenda

  • topic 1
  • topic 2
  • topic 3

History of the US Census

US Census Timeline

  • bullet 1
  • bullet 2

Data Visualization in the 1800s

The Statistical Atlas

In fact, up until recently, the Statistical Atlas had been published and released for each Census since 1870! It is a large compilation of data visualizations based on census data:

1920 Statistical Atlas

Delaware Population (1920)

USA Population Density (1920)

Migration by State (1920)

Nearly 100 years later, computers make this task MUCH easier…

Tidycensus Overview

Variable Search

Load Var Function

  • getting api key
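A minimal sketch of a variable search, assuming the 2017 5-year ACS (the year comes from the plot caption later in the deck; the `"acs5"` dataset and the search phrase are assumptions chosen for illustration). `load_variables()` returns a table with `name`, `label`, and `concept` columns that can be filtered like any other data frame. It needs an internet connection the first time it runs.

```r
library(tidycensus)
library(dplyr)
library(stringr)

# Download (and cache) the variable list for the 2017 5-year ACS.
# The year/dataset pair is an assumption; adjust it to the survey you use.
acs_vars <- load_variables(2017, "acs5", cache = TRUE)

# Search the labels for a phrase of interest (hypothetical search term).
income_vars <- acs_vars %>%
  filter(str_detect(label, regex("median household income", ignore_case = TRUE)))

income_vars
```

The `name` column holds the variable codes that can be passed to `get_acs()`.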

What is Tidy Data

All of the tidyverse packages operate easily when you have data in this structure!

Three interrelated rules:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.

https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structure
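To make the three rules concrete, here is a small sketch with made-up numbers (the tract labels and populations are toy values, not census data). The wide table stores years in its column names, so the `year` variable has no column of its own; `pivot_longer()` fixes that. Note that `pivot_longer()` requires tidyr 1.0.0 or later; older versions would use `gather()` instead.

```r
library(tidyr)
library(tibble)

# Toy, made-up values: the column names `2016` and `2017` are really
# values of a "year" variable, so this table is not tidy.
wide <- tibble(
  tract  = c("A", "B"),
  `2016` = c(100, 200),
  `2017` = c(110, 190)
)

# One row per tract-year observation, one column per variable.
tidy <- pivot_longer(
  wide,
  cols      = c(`2016`, `2017`),
  names_to  = "year",
  values_to = "population"
)

tidy
```

After reshaping, every row is one observation (a tract in a year) and every column is one variable.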

Steps Through The Analysis

Load Libraries

library(tidyverse)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.1     ✔ dplyr   0.8.1
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(sf)
## Linking to GEOS 3.7.2, GDAL 2.4.1, PROJ 6.0.0
library(tigris)
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
## 
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
## 
##     plot
library(tidycensus)

Your Choices!

  1. Get an API Key from http://api.census.gov/data/key_signup.html
census_api_key("<YOUR API KEY>")
demo_variables <- c("<YOUR VARIABLE CODES>") # define the variables you want to analyze here
de_census_data <- get_acs(geography = "tract",
  state = "DE",
  variables = demo_variables,
  geometry = TRUE,
  cb = TRUE)
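As an illustration of what `demo_variables` might look like, here is a sketch that assumes two commonly used ACS codes: `B01003_001` (total population) and `B19013_001` (median household income). These are example choices, not the variables used in the original analysis. The definition would go before the `get_acs()` call.

```r
# Hypothetical selection of ACS variables; swap in the codes you care about.
# Naming the elements makes get_acs() use the names in its `variable` column.
demo_variables <- c(
  total_population        = "B01003_001",
  median_household_income = "B19013_001"
)

demo_variables
```

Using a named vector keeps the facet labels in later plots readable, since the names replace the raw codes.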

OR

  2. Load de_census_data_export.RData
load("data/de_census_data_export.RData")

Clean

  • No need to do this if you used Choice 2
de_census_data_clean <- de_census_data %>%
    # NAME looks like "Census Tract 2, New Castle County, Delaware"
    separate(col = NAME, into = c("Census_Tract", "County", "State"), sep = ", ") %>%
    separate(col = Census_Tract, into = c("Census", "Tract", "Number"), sep = " ") %>%
    # dplyr::rename() takes new_name = old_name
    rename(Census_Tract_Number = Number)
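To see what the cleaning step does without downloading anything, here is a small sketch on two hand-written `NAME` strings in the format the ACS tract data uses. The toy table and the `toy_clean` name are illustrative only.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Two NAME values written by hand in the ACS tract format.
toy <- tibble(
  NAME = c("Census Tract 2, New Castle County, Delaware",
           "Census Tract 6.01, New Castle County, Delaware")
)

toy_clean <- toy %>%
  # Split on ", " so County and State carry no leading space.
  separate(col = NAME, into = c("Census_Tract", "County", "State"), sep = ", ") %>%
  # Split "Census Tract 2" into its three words; keep only the number.
  separate(col = Census_Tract,
           into = c("Census", "Tract", "Census_Tract_Number"), sep = " ") %>%
  select(-Census, -Tract)

toy_clean
```

The tract number comes out as text (for example `"6.01"`), which is worth keeping in mind when filtering on it.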

Now let’s recreate that map we saw

Subset only Wilmington

# Wilmington Tracts - ask Eli where to find these numbers
Wilm_census_data <- de_census_data_clean %>%
  filter(Census_Tract_Number %in% c(2, 3, 4, 5, 6.01, 6.02, 9, 11, 12, 13, 14, 15, 16, 19.02, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30.02))

Let’s Plot EVERYTHING!

# Plot all our data, one panel per variable
ggplot(Wilm_census_data, aes(fill = estimate)) +
  geom_sf() +
  scale_fill_viridis_c() +
  scale_color_viridis_c(guide = "none") +
  theme_minimal() +
  # EPSG:26918 is NAD83 / UTM zone 18N, the zone that covers Delaware
  coord_sf(crs = 26918, datum = NA) +
  labs(title = "Estimates by Census Tract",
       subtitle = "Wilmington, DE",
       caption = "Data source: 2017 ACS.\nData acquired with the R tidycensus package.",
       fill = "ACS estimate") +
  facet_wrap(~variable)

What’s an issue here?

Looking back: Why was Tidy data useful

Because each row holds one estimate for one tract and one variable, and R works on whole columns at once, we can use commands like:

  • facet_wrap
  • group_by
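A minimal sketch of both commands on a toy long-format table (the variable names and estimates are made up for illustration). Grouping by `variable` summarises each variable separately, and faceting by `variable` draws one panel per variable, with `scales = "free_y"` so that variables of very different magnitude each get their own axis.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy, made-up estimates in tidy form: one row per tract and variable.
toy <- tibble(
  tract    = rep(c("2", "3", "4"), times = 2),
  variable = rep(c("var_a", "var_b"), each = 3),
  estimate = c(10, 20, 30, 1000, 2000, 3000)
)

# group_by(): one summary row per variable.
summary_by_variable <- toy %>%
  group_by(variable) %>%
  summarise(min_estimate = min(estimate),
            max_estimate = max(estimate))

# facet_wrap(): one panel per variable, each with its own y axis.
p <- ggplot(toy, aes(x = tract, y = estimate)) +
  geom_col() +
  facet_wrap(~variable, scales = "free_y")

summary_by_variable
```

Neither step needs a loop over variables; the single `variable` column is what makes both one-liners possible.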

Recreating Maps/Other Analysis

Next steps